Dependency-based Sentence Simplification for Increasing Deep LFG Parsing Coverage
Abstract
Large-scale deep grammars can achieve high coverage of corpus data, yet cannot produce a full analysis for every sentence. In this paper, we present a dependency-based sentence simplification approach that obtains full parses for simplified versions of sentences that failed to receive a complete analysis in their original form. To remove the erroneous parts that cause the failure, we delete phrases from failed sentences on the basis of their dependency structure, and reprocess the remaining shorter sentences with XLE to get full analyses. We ensure grammaticality and preserve the core argument structure of the simplified sentences by restricting the deletion scheme to a set of modifier phrases. We apply our approach to German data and retrieve full parses of simplified sentences for 52.37% of the failed TIGER sentences. When original and simplified sentences are combined, the share of TIGER Treebank sentences with a full XLE parse increases from 80.66% to 90.79%.
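The deletion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Token` class, the `DELETABLE` label set, the `ROOT` label, and the example sentence are all hypothetical, and the abstract does not state which modifier labels are actually used. The sketch assumes a dependency parse is already available and produces one shorter candidate per deletable modifier subtree; reparsing with XLE is left out. The helper `combined_coverage` only checks the arithmetic of the reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    head: int     # idx of the governing token, 0 for the root
    deprel: str   # dependency label


# Hypothetical label set: the abstract only says "a set of modifier phrases".
DELETABLE = {"MO", "APP"}


def subtree(tokens, root_idx):
    """Return the indices of root_idx and of every token it dominates."""
    ids = {root_idx}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in ids and t.idx not in ids:
                ids.add(t.idx)
                changed = True
    return ids


def simplified_candidates(tokens, deletable=DELETABLE):
    """Yield one shorter sentence per deletable modifier subtree."""
    for t in tokens:
        if t.deprel in deletable:
            drop = subtree(tokens, t.idx)
            yield " ".join(tok.form for tok in tokens if tok.idx not in drop)


def combined_coverage(original_pct, recovered_pct_of_failed):
    """Coverage after adding the recovered share of the failed sentences."""
    failed = 100.0 - original_pct
    return original_pct + failed * recovered_pct_of_failed / 100.0


# Illustrative German sentence with an assumed dependency analysis.
sentence = [
    Token(1, "Peter", 2, "SB"),
    Token(2, "liest", 0, "ROOT"),
    Token(3, "heute", 2, "MO"),
    Token(4, "ein", 5, "NK"),
    Token(5, "Buch", 2, "OA"),
    Token(6, "im", 2, "MO"),
    Token(7, "Garten", 6, "NK"),
]

candidates = list(simplified_candidates(sentence))
for candidate in candidates:
    print(candidate)

print(round(combined_coverage(80.66, 52.37), 2))
```

Because only modifier subtrees are removed, the subject and object of the example survive in every candidate, which mirrors the stated goal of preserving the core argument structure. The coverage helper confirms that recovering 52.37% of the 19.34% failed sentences is consistent with the reported rise from 80.66% to 90.79%.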
Similar resources
Treebank-Based Acquisition of Chinese LFG Resources for Parsing and Generation
This thesis describes a treebank-based approach to automatically acquire robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena an...
Dependency-Based Sentence Simplification for Large-Scale LFG Parsing: Selecting Simplified Candidates for Efficiency and Coverage
Large-scale LFG grammars achieve high coverage on corpus data, yet can fail to give a full analysis for each sentence. One approach proposed to gain at least the argument structure of those failed sentences is to simplify them by deleting subtrees from their dependency structure (provided by a more robust statistical dependency parser). The simplified versions are then re-parsed to receive a f...
Treebank-Based Acquisition of Multilingual Unification Grammar Resources
Deep unification- (constraint-)based grammars are usually hand-crafted. Scaling such grammars from fragments to unrestricted text is time-consuming and expensive. This problem can be exacerbated in multilingual broad-coverage grammar development scenarios. Cahill et al. (2002, 2004) and O’Donovan et al. (2004) present an automatic f-structure annotation-based methodology to acquire broad-coverage...
Dependency Parsing Resources for French: Converting Acquired Lexical Functional Grammar F-Structure Annotations and Parsing F-Structures Directly
Recent years have seen considerable success in the generation of automatically obtained wide-coverage deep grammars for natural language processing, given reliable and large CFG-like treebanks. For research within the Lexical Functional Grammar framework, these deep grammars are typically based on an extended PCFG parsing scheme from which dependencies are extracted. However, increasing success in ...
LFG without C-structures
We explore the use of two dependency parsers, Malt and MST, in a Lexical Functional Grammar parsing pipeline. We compare this to the traditional LFG parsing pipeline which uses constituency parsers. We train the dependency parsers not on classical LFG f-structures but rather on modified dependency-tree versions of these in which all words in the input sentence are represented and multiple heads...